Goto

Collaborating Authors

 stationary measure


Uniform-in-time convergence bounds for Persistent Contrastive Divergence Algorithms

Oliva, Paul Felix Valsecchi, Akyildiz, O. Deniz, Duncan, Andrew

arXiv.org Machine Learning

We propose a continuous-time formulation of persistent contrastive divergence (PCD) for maximum likelihood estimation (MLE) of unnormalised densities. Our approach expresses PCD as a coupled, multiscale system of stochastic differential equations (SDEs), which perform optimisation of the parameter and sampling of the associated parametrised density, simultaneously. From this novel formulation, we are able to derive explicit bounds for the error between the PCD iterates and the MLE solution for the model parameter. This is made possible by deriving uniform-in-time (UiT) bounds for the difference in moments between the multiscale system and the averaged regime. An efficient implementation of the continuous-time scheme is introduced, leveraging a class of explicit, stable intregators, stochastic orthogonal Runge-Kutta Chebyshev (S-ROCK), for which we provide explicit error estimates in the long-time regime. This leads to a novel method for training energy-based models (EBMs) with explicit error guarantees.



Arrival Control in Quasi-Reversible Queueing Systems: Optimization and Reinforcement Learning

Comte, Céline, Moyal, Pascal

arXiv.org Artificial Intelligence

In this paper, we introduce a versatile scheme for optimizing the arrival rates of quasi-reversible queueing systems. We first propose an alternative definition of quasi-reversibility that encompasses reversibility and highlights the importance of the definition of customer classes. In a second time, we introduce balanced arrival control policies, which generalize the notion of balanced arrival rates introduced in the context of Whittle networks, to the much broader class of quasi-reversible queueing systems. We prove that supplementing a quasi-reversible queueing system with a balanced arrival-control policy preserves the quasi-reversibility, and we specify the form of the stationary measures. We revisit two canonical examples of quasi-reversible queueing systems, Whittle networks and order-independent queues. Lastly, we focus on the problem of admission control and leverage our results in the frameworks of optimization and reinforcement learning.


Kinetic Interacting Particle Langevin Monte Carlo

Oliva, Paul Felix Valsecchi, Akyildiz, O. Deniz

arXiv.org Machine Learning

This paper introduces and analyses interacting underdamped Langevin algorithms, termed Kinetic Interacting Particle Langevin Monte Carlo (KIPLMC) methods, for statistical inference in latent variable models. We propose a diffusion process that evolves jointly in the space of parameters and latent variables and exploit the fact that the stationary distribution of this diffusion concentrates around the maximum marginal likelihood estimate of the parameters. We then provide two explicit discretisations of this diffusion as practical algorithms to estimate parameters of statistical models. For each algorithm, we obtain nonasymptotic rates of convergence for the case where the joint log-likelihood is strongly concave with respect to latent variables and parameters. In particular, we provide convergence analysis for the diffusion together with the discretisation error, providing convergence rate estimates for the algorithms in Wasserstein-2 distance. To demonstrate the utility of the introduced methodology, we provide numerical experiments that demonstrate the effectiveness of the proposed diffusion for statistical inference and the stability of the numerical integrators utilised for discretisation. Our setting covers a broad number of applications, including unsupervised learning, statistical inference, and inverse problems.


Learning Optimal Admission Control in Partially Observable Queueing Networks

Anselmi, Jonatha, Gaujal, Bruno, Rebuffi, Louis-Sébastien

arXiv.org Artificial Intelligence

We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network. Specifically, only the arrival and departure times from the network are observable, and optimality refers to the average holding/rejection cost in infinite horizon. While reinforcement learning in Partially Observable Markov Decision Processes (POMDP) is prohibitively expensive in general, we show that our algorithm has a regret that only depends sub-linearly on the maximal number of jobs in the network, $S$. In particular, in contrast with existing regret analyses, our regret bound does not depend on the diameter of the underlying Markov Decision Process (MDP), which in most queueing systems is at least exponential in $S$. The novelty of our approach is to leverage Norton's equivalent theorem for closed product-form queueing networks and an efficient reinforcement learning algorithm for MDPs with the structure of birth-and-death processes.


Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space

Anselmi, Jonatha, Gaujal, Bruno, Rebuffi, Louis-Sébastien

arXiv.org Artificial Intelligence

In this paper, we revisit the regret of undiscounted reinforcement learning in MDPs with a birth and death structure. Specifically, we consider a controlled queue with impatient jobs and the main objective is to optimize a trade-off between energy consumption and user-perceived performance. Within this setting, the \emph{diameter} $D$ of the MDP is $\Omega(S^S)$, where $S$ is the number of states. Therefore, the existing lower and upper bounds on the regret at time$T$, of order $O(\sqrt{DSAT})$ for MDPs with $S$ states and $A$ actions, may suggest that reinforcement learning is inefficient here. In our main result however, we exploit the structure of our MDPs to show that the regret of a slightly-tweaked version of the classical learning algorithm {\sc Ucrl2} is in fact upper bounded by $\tilde{\mathcal{O}}(\sqrt{E_2AT})$ where $E_2$ is related to the weighted second moment of the stationary measure of a reference policy. Importantly, $E_2$ is bounded independently of $S$. Thus, our bound is asymptotically independent of the number of states and of the diameter. This result is based on a careful study of the number of visits performed by the learning algorithm to the states of the MDP, which is highly non-uniform.


A Polynomial Time MCMC Method for Sampling from Continuous DPPs

Gharan, Shayan Oveis, Rezaei, Alireza

arXiv.org Machine Learning

The notion of DPP was first introduced by [Mac75] to model fermions. Since then, They have been extensively studied, and efficient algorithms have been discovered for tasks like sampling from DPPs [LJS15, DR10, AGR16], marginalization [BR05], and learning [GKFT14, UBMR17] them (in the discrete domain). In machine learning they are mainly used to solve problems where selecting a diverse set of objects is preferred since they offer negative correlation. To get intuition about why they are good models to capture diversity, suppose each row of the gram matrix associated with the kernel is a feature vector representing an item. It means the probability of a set of items is proportional to the square of the volume of the space spanned the vectors representing items. Therefore, larger volume shows those items are more spread which resembles diversity.